Two-Dimensional Association Analysis For Finding Constant Value Biclusters In Real-Valued Data

Authors

  • Gowtham Atluri
  • Jeremy Bellay
  • Gaurav Pandey
  • Chad Myers
  • Vipin Kumar
Abstract

Biclustering is a commonly used type of analysis for real-valued data sets, and several algorithms have been proposed for finding different types of biclusters. However, no systematic approach has been proposed for exhaustively enumerating all (nearly) constant value biclusters in such data sets, which is the problem addressed in this paper. Using a monotonic range measure to capture the coherence of values in a block/submatrix of an input data matrix, we propose a two-step Apriori-based algorithm for discovering all nearly constant value biclusters, referred to as Range Constrained Blocks (RCBs). Through systematic evaluation on an extensive genetic interaction data set, we show that the submatrices with similar values represent groups of genes that are more functionally related than the biclusters with diverse values. We also show that our approach can exhaustively find all the biclusters with a range less than a given threshold, while the other competing approaches cannot find all such...
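The role of a monotonic range measure in an Apriori-style search can be illustrated with a small sketch. This is not the paper's two-step algorithm, whose details are not given in the abstract. It assumes the range of a block is the maximum value minus the minimum value, fixes the row set, and grows column sets level by level. The names `block_range`, `range_constrained_column_sets`, and `delta` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def block_range(matrix, rows, cols):
    """Range of a submatrix, assumed here to be max value minus min value."""
    values = [matrix[r][c] for r in rows for c in cols]
    return max(values) - min(values)


def range_constrained_column_sets(matrix, rows, delta):
    """Level-wise search for all column sets whose block over `rows`
    has range <= delta (illustrative sketch, not the paper's method).

    Adding a column can never decrease the range, so a column set can
    only qualify if every subset with one column removed also qualifies.
    """
    n_cols = len(matrix[0])
    level = [
        frozenset([c])
        for c in range(n_cols)
        if block_range(matrix, rows, [c]) <= delta
    ]
    found = []
    while level:
        found.extend(level)
        current = set(level)
        # Join step: merge pairs of sets that differ in exactly one column.
        candidates = {
            a | b for a in level for b in level if len(a | b) == len(a) + 1
        }
        # Prune step, then check the range of each surviving candidate.
        level = [
            cand
            for cand in candidates
            if all(
                frozenset(sub) in current
                for sub in combinations(cand, len(cand) - 1)
            )
            and block_range(matrix, rows, cand) <= delta
        ]
    return found


# Toy data (made up for illustration): columns 0 and 1 hold similar values.
matrix = [
    [1.0, 1.1, 5.0],
    [0.9, 1.2, 4.8],
]
result = range_constrained_column_sets(matrix, rows=[0, 1], delta=0.5)
```

In this toy matrix every single column qualifies, but only columns 0 and 1 can be combined into a wider block, because any block that includes column 2 alongside another column exceeds the threshold. The pruning step is what makes exhaustive enumeration feasible: supersets of a failing column set are never examined.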

Similar resources

Association Analysis for Real-valued Data: Definitions and Application to Microarray Data

The discovery of biclusters, which denote groups of items that show coherent values across a subset of all the transactions in a data set, is an important type of analysis performed on real-valued data sets in several domains, such as biology. Several algorithms have been proposed to find different types of biclusters in such data sets. However, the search schemes used by these algorithms are u...

Efficient Mining Differential Co-Expression Constant Row Bicluster in Real-Valued Gene Expression Datasets

Biclustering aims to mine a number of co-expressed genes under a set of experimental conditions in gene expression dataset. Recently, differential co-expression biclustering approach has been used to identify class-specific biclusters between two gene expression datasets. However, it cannot handle differential co-expression constant row biclusters efficiently in real-valued datasets. In this pa...

9th International Workshop on Data Mining in Bioinformatics (BIOKDD 2010)

An important analysis performed on microarray gene-expression data is to discover biclusters, which denote groups of genes that are coherently expressed for a subset of conditions. Various biclustering algorithms have been proposed to find different types of biclusters from these real-valued gene-expression data sets. However, these algorithms suffer from several limitations such as inability t...

Efficient mining of maximal biclusters in mixed-attribute datasets

This paper presents a novel enumerative biclustering algorithm to directly mine all maximal biclusters in mixed-attribute datasets, with or without missing values. The independent attributes are mixed or heterogeneous, in the sense that both numerical (real or integer values) and categorical (ordinal or nominal values) attribute types may appear together in the same dataset. The proposal is an ...

Constrained Subspace Clustering for Time Series Gene Expression Data

For time series gene expression data, it is an important problem to find subgroups of genes with similar expression pattern in a consecutive time window. In this paper, we extend a fuzzy c-means clustering algorithm to construct two models to detect biclusters respectively, i.e., constant value biclusters and similarity-based biclusters whose gene expression profiles are similar within consecut...

Publication date: 2009